An Armed Services Vocational Aptitude Battery (ASVAB) Review Panel, with expertise in personnel selection, job classification, psychometrics, and cognitive psychology, developed recommendations for changes to the military enlistment test battery. One recommendation was to develop and evaluate a test of cyber/information and communications technology literacy to supplement current ASVAB content. This article summarizes a multiphased Cyber Test development process: (a) a review of information/computer technology literacy definitions and measures, (b) development and pilot testing of a cyber knowledge measure, (c) validation of test scores against final school grades (FSGs) for selected technical training courses, (d) development of an operational reporting metric and subgroup norms, and (e) examination of construct validity. Results indicate that the Cyber Test has predictive validity against technical training school grades and incremental validity comparable to that of the ASVAB technical knowledge tests when used with the ASVAB Armed Forces Qualification Test (AFQT) verbal/math composite as a baseline.

Keywords: cyber, information and communications technology, technical knowledge, selection and classification, Armed Services Vocational Aptitude Battery


The use of computer and information technology (IT) is pervasive in modern society. It affects all aspects of everyday life, including commerce, communications, finance, government, military, transportation, utilities, and others. While the increased use of computers and IT has contributed to greater efficiency and cost savings, it also has led to increased vulnerability (e.g., information security, malicious intent, and theft). Over the last decade, computer and network security and vulnerability issues have increased dramatically in importance. A National Academy of Sciences (National Research Council, 2002) report emphasized the importance of cyber security in the wake of 9/11.

In the military, computers and IT are integral to the concept of net-centric operations. The objective of net-centric operations is to leverage an information advantage, enabled in part by IT, into a competitive advantage through the networking of geographically dispersed forces. A strong, effective IT network improves information sharing, which enhances the quality of information and shared situational awareness. Shared situational awareness, in turn, enhances collaboration and synchronization of activities, speed of command, and overall mission effectiveness.

In 2006, the U.S. Air Force announced that cyberspace would constitute a new mission domain, and in 2010 the Department of Defense (DoD) announced the establishment of U.S. Cyber Command (McMichael, 2010), which was tasked with coordinating offensive and defensive cyber-related activities. Competition among industry, the government, and the military for high-quality cyber/IT personnel is intense (Gould, 2013).

Selecting the Right People for Military Cyber Training

In 2005, the Defense Manpower Data Center (DMDC) initiated an Armed Services Vocational Aptitude Battery (ASVAB)¹ review process at the request of accession policy (Office of the Assistant Secretary of Defense). Factors driving the review included concerns that current ASVAB content was dated and the perceived potential of new measures to increase its predictive validity and classification efficiency. An expert review panel was convened to consider the current status of the ASVAB program and make recommendations for improvements. To this end, the ASVAB Review Panel (ARP) was briefed at three meetings in 2005 by military personnel and by technical and policy experts from the Services and DMDC. The briefings included information regarding test development (item specifications, development, and evaluation), current ASVAB use (psychometric properties, validity, and classification efficiency), supplemental measures used by the Services (ability, temperament), and job analysis methods (and their relations to test content). The ARP presented its findings (Drasgow, Embretson, Kyllonen, & Schmidt, 2006) in March 2006; they included 22 recommendations grouped into five broad areas: (a) content specifications, (b) test development and administration, (c) content changes, (d) development of a standardized validation and performance database, and (e) English language proficiency and its effect on test scores.

Proposed content changes included the development and evaluation of measures of noncognitive characteristics, nonverbal reasoning, and information/communications technology literacy (ICTL). The Air Force took the lead on the development of a cyber/ICTL measure, as Air Force leadership had identified cyberspace operations as a critical and major growth area. The ARP speculated that an updated technical knowledge test along the lines of the ASVAB Electronics Information test might improve predictive validity and classification efficiency. This recommendation is consistent with a 2006 report by the National Academy of Engineering and the National Research Council regarding technological literacy (Garmire & Pearson, 2006).

A series of studies was conducted with the goal of developing and psychometrically evaluating a cyber/ICTL test. These were: (a) a literature review of ICTL (hereafter referred to as cyber knowledge) definitions and measures, (b) development and pilot testing of a cyber knowledge measure, (c) validation of cyber knowledge test scores against final school grades for selected technical training courses, (d) development of subgroup norms, and (e) examination of construct validity. The following sections summarize each of these studies.

Concept Definition and Initial Test Development

Literature Review: Definitions and Measures

The DoD sponsored a literature review on cyber/IT measurement (Russell & Sellman, 2007, 2008b) whose specific objectives were to develop a working definition based on prior research and to identify and review existing tests. To arrive at a working definition of cyber/IT, taxonomies of information and computer literacy concepts developed by the National Research Council and others were reviewed and compared. The resulting working definition contained seven elements common across the taxonomies: (a) using computers (basic), (b) communicating, (c) gathering information, (d) using information technology (IT) tools and resources, (e) using networks, (f) programming, and (g) taking the broad view. The literature search also revealed several cyber/IT measures. Russell and Sellman (2007) compared existing cyber/IT measures against the working definition of cyber/IT literacy and several technical criteria and concluded that none of them covered all aspects of the working definition. Nonetheless, several of the measures demonstrated useful testing approaches and unique item types.

¹ ASVAB tests include Arithmetic Reasoning (AR), Assembling Objects (AO), Auto and Shop Information (AS), Electronics Information (EI), General Science (GS), Mathematics Knowledge (MK), Mechanical Comprehension (MC), Paragraph Comprehension (PC), and Word Knowledge (WK). The verbal tests (PC and WK) are combined into a verbal (VE) composite. VE and the math tests (AR and MK) are combined into the Armed Forces Qualification Test (AFQT) composite, which is used by all U.S. military Services for enlistment qualification. Each Service develops its own composites to qualify applicants for technical training.

The initial cyber/IT literacy working definition was based entirely on the literature and existing definitions, which were primarily civilian in nature. Russell and Sellman (2007) recommended that the cyber/IT literacy requirements of military jobs be integrated with the working definition to focus content development.

Development and Pilot Testing of a Cyber Knowledge Test

Identification of knowledge, skills, abilities, and other characteristics for measurement.

Once a working definition of cyber/IT literacy was developed, the next step was to create a taxonomy of knowledge, skills, abilities, and other characteristics (KSAOs) required for successful performance in cyber/IT occupations. The taxonomy was used to create Cyber Test (CT) content specifications. Activities included (a) a review and integration of existing taxonomies, (b) interviews with military cyber/IT subject matter experts (SMEs), and (c) an online survey of additional military IT SMEs to evaluate and modify the initial taxonomy.

Review and integration of existing taxonomies. Several sources were reviewed to identify a set of KSAOs for measurement. These included the National Workforce Center for Emerging Technologies Web site, which contains industry-derived skills standards for IT; knowledge base categories from an IT publication focused on IT managers (Computerworld.com); and occupational information (education and training plans) for cyber/IT-related career fields in the Air Force, Army, and Navy. The resulting taxonomy consisted of 79 specific knowledge statements organized into four broad areas: (a) networking and telecommunications, (b) computer operations, (c) security and compliance, and (d) software programming and Web design.

There were two main concerns with the original knowledge taxonomy. The first was that it was civilian-centric. The second was that it was not known whether the KSAOs were entry-level or more appropriate for advanced positions. To address these issues, military cyber/IT SMEs were recruited to review and modify the taxonomy to make it more appropriate for qualifying military applicants for entry-level technical training.

Interviews with military SMEs. Seventy-two cyber/IT SMEs from the Air Force (31), Army (3), and Navy (38) were interviewed by phone or face-to-face in small groups about the initial taxonomy. It was explained that the objective was to develop an entry-level technical knowledge test that could be administered as part of the ASVAB. SMEs were asked which knowledge statements were entry-level. They also were asked to add new knowledge statements they thought were important and to make wording changes as needed. Finally, SMEs were shown some examples of different types of test items and asked for ideas about potential test item types. The revised taxonomy, summarized in Table 1, consisted of 39 knowledge statements.

It became apparent during the interviews that the SMEs viewed basic abilities, particularly reasoning, as important for success in IT or cyber-related training. With this in mind, we reviewed two well-known individual differences taxonomies (Carroll, 1993; Fleishman, Costanza, & Marshall-Mies, 1999) and defined abilities thought to be important for IT and cyber-related occupations. Drafts of the abilities list were discussed with SMEs over the course of the interviews to determine occupational relevance. The final list of 12 abilities appears in Table 2.

Online SME survey. An online survey was administered to gather data from SMEs on the cyber/IT knowledge and abilities identified in the previously described steps. Thirteen Air Force and 37 Navy SMEs completed the survey, which had four parts. Part 1 collected participant background data. In Part 2, SMEs made judgments about each of the 39 statements in the final knowledge taxonomy. They were asked to indicate whether the knowledge was basic or advanced, to rate its importance, and to rate the likelihood that the knowledge will change in the future. These three judgments were designed to help identify important, stable, basic knowledge areas that were good candidates for measurement on the Cyber Test. In Part 3, SMEs were asked to imagine that they were creating a test and to indicate how many items should be distributed across the four broad knowledge areas. Finally, in Part 4, SMEs rated the importance of the 12 abilities (see Table 2). The purpose of this part was to document the importance of abilities that might be measured by the CT or by other ASVAB tests.

Table 1

IT Career Clusters and Core Skills

Broad area
• IT cluster: Example of specific knowledge statement

Networking and Communications
• Network communications and maintenance: Knowledge of network protocols and standards
• Telecommunications: Knowledge of telecommunications topologies

Computer Operations
• PC configuration and maintenance: Knowledge of file structure
• Using IT tools/software: Knowledge of features and general uses of word processing software

Security and Compliance
• System security: Knowledge of security methodologies for routing devices
• Offensive methods: Knowledge of encryption and decryption methods

Software Programming and Web Design
• Software programming: Knowledge of basic language constructs
• Database development and administration: Knowledge of database querying methods
• Web development: Knowledge of web-based data environments
• Data formats: Understanding the differences between data formats
• Numbering systems: Understanding the different numbering systems, such as hex and binary

Table 2

Definitions of Abilities for Cyber/IT Occupations

Ability: Definition

Verbal reasoning: Ability to solve verbal/word problems by reasoning logically
Nonverbal reasoning: Ability to solve nonverbal problems (graphical, puzzles, and diagrammatic) by reasoning logically
Mathematical reasoning: Ability to reason mathematically and choose the right mathematical methods or formulas to solve a problem
Problem sensitivity: Ability to tell when something is wrong or is likely to go wrong. It does not involve solving the problem, only recognizing that there is a problem.
Originality: Ability to come up with unusual or clever ideas about a given topic or situation, or to develop creative ways to solve a problem
Information ordering: Ability to arrange things or actions in a certain order or pattern according to a specific rule or set of rules (e.g., patterns of numbers, letters, words, pictures, mathematical operations)
Written comprehension: Ability to read and understand information and ideas presented in writing
Oral comprehension: Ability to listen to and understand information and ideas presented through spoken words and sentences
Perceptual speed: Ability to quickly and accurately compare similarities and differences among sets of letters, numbers, objects, pictures, or patterns
Advanced written comprehension: Ability to read and understand technical and/or government documents
Written expression: Ability to communicate information and ideas in writing so others will understand
Near vision: Ability to see details at close range (within a few feet of the observer)

SMEs considered all four broad cyber/IT knowledge areas to be important. They indicated that most of the items should focus on Networking and Telecommunications (29.4%), Computer Operations (28.3%), and Security and Compliance (27.0%), with less emphasis on Software Programming and Web Design (15.2%). SMEs considered most of the Computer Operations knowledge statements (74.0%) to be entry-level, with smaller percentages attributed to Networking and Telecommunications (48.0%), Security and Compliance (29.6%), and Software Programming and Web Design (30.6%).
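The SME percentages become an item blueprint only once a form length is fixed. The sketch below uses the largest-remainder (Hamilton) method to turn the reported shares into whole item counts. The 30-item form length is a hypothetical value chosen for illustration, not the operational CT length, and the article does not say how, or whether, the shares were rounded into counts.

```python
from math import floor


def allocate_items(shares, total_items):
    """Split total_items across content areas in proportion to shares,
    using the largest-remainder (Hamilton) method."""
    total_share = sum(shares.values())
    # Exact fractional quota per area; normalizing means shares need not sum to 100
    quotas = {area: total_items * s / total_share for area, s in shares.items()}
    counts = {area: floor(q) for area, q in quotas.items()}
    leftover = total_items - sum(counts.values())
    # Hand the remaining items to the areas with the largest fractional remainders
    by_remainder = sorted(quotas, key=lambda area: quotas[area] - counts[area], reverse=True)
    for area in by_remainder[:leftover]:
        counts[area] += 1
    return counts


# Percentages reported by the SMEs (they sum to 99.9 because of rounding)
sme_shares = {
    "Networking and Telecommunications": 29.4,
    "Computer Operations": 28.3,
    "Security and Compliance": 27.0,
    "Software Programming and Web Design": 15.2,
}

blueprint = allocate_items(sme_shares, 30)  # hypothetical 30-item form
print(blueprint)
```

With these shares, a 30-item form would carry 9, 8, 8, and 5 items in the four areas, respectively.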

SMEs rated nearly all of the 12 abilities as very important. The communications-related abilities (Written Comprehension, Advanced Written Comprehension, Written Expression, and Oral Comprehension) held four of the top five importance ratings. These results suggested that IT and cyber-related jobs are very cognitively demanding and that it may be useful to expand the coverage of communications skills in predictors for cyber-related occupations.

Development of an initial experimental item pool. Once the KSAOs to be measured were defined, attention turned to identifying item types and measurement methods. Although several item types were considered, including information/knowledge, logic-based reasoning, situational judgment, nonverbal reasoning, scenario/stimulus-based, and biographical data, there were three important constraints. First, the items needed to have a format that would be consistent with other ASVAB items and capable of being administered on the CAT-ASVAB platform.² Second, the new test needed to be relatively short and efficient, ultimately about 20 min in length for the operational form. The first two constraints virtually dictated a selected-response test. The third constraint was that the new test needed to provide incremental validity beyond that provided by the ASVAB. There is a wealth of evidence that the ASVAB is a good measure of cognitive aptitude for a number of constructs, such as mathematical and verbal aptitude. This meant that the new test needed to focus on KSAOs not already tapped by the ASVAB.
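The third constraint can be made concrete with the standard two-predictor formula for the squared multiple correlation, computed from the three zero-order correlations. The sketch is illustrative only: the validities and intercorrelations are hypothetical values, not results from this project. It shows why a new test that overlaps heavily with the ASVAB baseline adds little, even when its own validity is respectable.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """Squared multiple correlation of criterion y on predictors 1 and 2,
    computed from the three zero-order correlations."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


def incremental_r_squared(r_y_base, r_y_new, r_base_new):
    """Gain in R^2 from adding a new predictor to a single baseline predictor."""
    full = r_squared_two_predictors(r_y_base, r_y_new, r_base_new)
    # For a valid correlation matrix the gain cannot be negative;
    # clamping only absorbs floating-point noise near zero.
    return max(full - r_y_base**2, 0.0)


# Hypothetical values: baseline validity .50, new-test validity .40
for overlap in (0.80, 0.30):
    gain = incremental_r_squared(0.50, 0.40, overlap)
    print(f"r(baseline, new) = {overlap:.2f}: delta R^2 = {gain:.3f}")
```

With a baseline validity of .50 and a new-test validity of .40, the gain is essentially zero when the two predictors correlate .80, but about .07 when they correlate .30.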

Based on discussions with the cyber/IT SMEs, it was decided to focus the experimental item pool on information or knowledge, logic-based reasoning, and biographical data items. Information tests were among the most successful and most highly valid printed classification tests created by the Army Air Forces (AAF) Aviation Psychology Program during World War II. Guilford and Lacey (1947) saw information tests as maximal performance interest measures. That is, information tests are thought to be indirect measures of interest, motivation, aptitude, and skill in a particular area. Moreover, they are not intended to certify an individual at a particular skill level or identify who does not need training. Rather, they are designed to assess knowledge and skill at a very general level while also providing an objective measure of interest and motivation in a technical content area. Knowledge or information tests continue to serve military selection and classification well today. The ASVAB General Science, Electronics Information, and Auto and Shop Information tests are all measures of technical knowledge or information in their respective content domains, and after decades of use they have proven successful for military selection and classification (Oppler, Russell, Rosse, Keil, Meiman, & Welsh, 1997). For these reasons, technical knowledge items in the cyber/IT knowledge domain were expected to be good candidates for inclusion on the cyber/IT aptitude test. We concluded that information or knowledge items were likely to be very useful predictors of performance in training for cyber-related jobs.

Logic-based reasoning (LBR) items assess inductive or deductive reasoning skills by presenting examinees with a premise or set of premises and asking them to choose the one valid conclusion from a series of conclusions (Colberg, Nester, & Trattner, 1985). Although LBR did not appear among the critical cyber/IT KSAOs, LBR items were included because military cyber/IT SMEs indicated that they believed reasoning ability to be an important determinant of job performance. We thought LBR items might be a useful way to assess reasoning skills needed for cyber-related jobs.

Deductive LBR items are essentially formal syllogisms placed in the scaffolding of a traditional verbal reasoning test item. Inductive LBR items are similar in structure but rely on probabilistic rather than necessary premises and conclusions. The LBR items were expected to show a small amount of incremental validity when used in combination with the ASVAB, as both assess general mental ability (g) (Stauffer, Ree, & Carretta, 1996).

² The CAT-ASVAB is a computerized adaptive testing platform for administering the ASVAB at the Military Entrance Processing Stations (MEPS).

Biodata items (Stokes, Mumford, & Owens, 1994) are based on the notion that the best indicator of future performance is past performance (Wernimont & Campbell, 1968). Such items assess biographical information relevant to job performance. Past research has indicated that well-constructed biodata measures can exhibit good levels of criterion-related validity (e.g., Carlson, Scullen, Schmidt, Rothstein, & Erwin, 1999; Rothstein, Schmidt, Erwin, Owens, & Sparks, 1990; Schmidt & Hunter, 1998) and small subgroup differences (e.g., Reilly & Chao, 1982). The main drawback of biodata items is that they could be subject to response distortion when applicants are seeking highly valued occupations. Further, biographical data measures have been shown to demonstrate little incremental validity for predicting training and job performance when used in combination with measures of general mental ability (Schmidt & Hunter, 1998). Even so, we chose to develop biodata items because they are efficient to administer, inexpensive to develop, and offer a very different methodology.

Finally, SMEs had emphasized the importance of reasoning ability. Many described the ability to solve puzzles such as Sudoku as occupationally relevant. To evaluate nonverbal reasoning ability, we administered a Figural Reasoning (FR) assessment. FR was previously used in the Army's Project A (Russell, Peterson, Rosse, Hatten, McHenry, & Houston, 2001) and the Enhanced Computer Adaptive Test (ECAT) project (Alderton, Wolfe, & Larson, 1997). Administering a nonverbal reasoning test would allow us to estimate how well such a measure would work for cyber/IT jobs. As with the LBR items, the FR test, as a measure of nonverbal reasoning, was expected to demonstrate a small amount of incremental validity when used in combination with the ASVAB, as both measure g (Stauffer et al., 1996).

Although a traditional multiple-choice format was used for most of the information or knowledge and LBR items (75%), some were developed using nontraditional formats (e.g., multiple response, matching). The main advantages of nontraditional items are that they add face validity and variety for examinees and are expected to result in less guessing. However, it was recognized that these item formats would be difficult to integrate into the CAT-ASVAB system.³

The number of items developed for each knowledge area was based on discussions with SMEs about which content areas best reflected entry-level training requirements. The initial item pool had 219 items: 162 knowledge/information, 43 logic, and 14 biodata items⁴ (Russell & Sellman, 2008a). Following the technical and sensitivity reviews, several items were edited, and those thought to be too difficult were replaced with easier items. The final item pool consisted of 206 items: 148 knowledge or information, 44 logic, and 14 biodata items. Figural Reasoning was not included in the pilot test stage because it had been through rigorous development and review in the Army's Project A (Russell et al., 2001) and the ECAT project (Alderton et al., 1997).

Pilot test procedures, data processing, and sample demographics. Four forms of the CT were developed to minimize the effects of fatigue and item order on psychometric results. Each version included all of the items, but the items were presented in different orders.

The pilot test sample consisted of 684 examinees from two groups: 586 U.S. Air Force Basic Recruits at Lackland AFB, TX, and 98 U.S. Navy trainees attending a Cryptologic Technician Networks (CTN) course at Pensacola, FL. The USAF sample contained a higher proportion of women than did the Navy sample (29.7% vs. 18.4%). The two groups were similar in racial and ethnic representation, with about 79% White and 89% non-Hispanic in each group. Although the U.S. Army showed some interest in the cyber/IT test, other research priorities precluded its involvement in the pilot study.

³ Nontraditional formats such as multiple response or matching may violate assumptions of the item response theory (IRT) model used in CAT-ASVAB, such as local independence. Polytomously scored items also present a challenge for integration with CAT-ASVAB, which uses only dichotomously scored items under the three-parameter logistic model (3PL; Lord & Novick, 1968). IRT models appropriate for polytomously scored items (e.g., Muraki, 1997) are available, and mixing of models is not problematic within the IRT framework. Nevertheless, the current CAT-ASVAB infrastructure is configured to work with the 3PL model only, and revising it to include other models would require substantial changes to the current system.

⁴ The 14 biodata items were in multiple-response format, representing 79 discrete items.
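Footnote 3 notes that CAT-ASVAB scores dichotomous items with the three-parameter logistic (3PL) model and that nontraditional formats may violate local independence. The sketch below shows the 3PL item response function and, under local independence, the log-likelihood of a response pattern as a sum over items. The scaling constant D = 1.7 is a common convention and is an assumption here, as are the item parameters in the example; they are not CAT-ASVAB values, and the grid search is for illustration only, not a description of CAT-ASVAB scoring.

```python
from math import exp, log

D = 1.7  # assumed scaling constant; makes the logistic curve approximate the normal ogive


def p_correct_3pl(theta, a, b, c):
    """3PL probability that an examinee at ability theta answers correctly.
    a = discrimination, b = difficulty, c = lower asymptote (pseudo-guessing)."""
    return c + (1 - c) / (1 + exp(-D * a * (theta - b)))


def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response pattern. Local independence lets the
    pattern probability factor into a product over items (a sum of logs)."""
    total = 0.0
    for u, (a, b, c) in zip(responses, items):
        p = p_correct_3pl(theta, a, b, c)
        total += log(p) if u == 1 else log(1 - p)
    return total


# Hypothetical item parameters (a, b, c); not CAT-ASVAB values
items = [(1.2, -0.5, 0.20), (0.9, 0.0, 0.25), (1.5, 0.8, 0.20)]
responses = [1, 1, 0]

# Crude grid search for the maximum-likelihood ability estimate
grid = [x / 10 for x in range(-40, 41)]
theta_hat = max(grid, key=lambda t: log_likelihood(t, responses, items))
print(p_correct_3pl(0.0, 1.2, 0.0, 0.2), theta_hat)
```

Because the likelihood is a simple product over items, a multiple-response item whose options depend on one another would be counted as several independent pieces of evidence, which is one way local dependence can distort ability estimates.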

Biodata results. Examination of responses to biodata items indicated that both the USAF Basic Recruits and the Navy CTN trainees used computers and information technology in their daily lives. Common activities included instant messaging, playing Internet games, participating in virtual environments, and downloading or listening to podcasts. They also were knowledgeable about computer operations (e.g., set up wired or wireless home network, set up/install/upgrade operating system on a home PC, and scan for/remove viruses). The Navy CTN sample was more experienced than the USAF Basic Recruits on technical computer network tasks (e.g., set up wired and nonwired networks) and on computer programming languages. This was not surprising, as the USAF Basic Recruits represented a cross-section of technical training specialties, whereas the Navy CTN trainees were already assigned to an IT-related training course.

Knowledge and logic results. A major objective of this project was to use the pilot test data to evaluate test items and assemble alternate test forms containing a subset of the items. We began by screening all 192 cognitive items (148 knowledge and 44 logic) using item statistics based on Classical Test Theory (CTT). Items were flagged based on proportion-correct (p) values and item-total correlations. Items with p values greater than .80 were flagged as "easy" and those with values less than .20 as "hard." Items with item-total correlations less than .20 were flagged as "weak." Although this information was used in the decision process, items were not necessarily removed because they were too easy or too hard or had a low item-total correlation. Ninety-eight items (72 multiple choice and 26 nontraditional) survived this initial screening process.

Test items then were evaluated based on their psychometric characteristics and content. Three pre-equated knowledge test forms and three pre-equated logic test forms were assembled to be parallel with respect to item discriminability, difficulty, and content. Internal consistency reliabilities ranged from .62 to .79 across the forms and samples. Values of this magnitude were not unexpected, given the range of content and the relatively small number of items.
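The article does not name the internal consistency index it used. Assuming coefficient alpha (equivalent to KR-20 for 0/1 items), the sketch below computes it and adds the Spearman-Brown prophecy formula, which makes the point about test length concrete: a form with reliability .62 would be expected to reach about .77 if lengthened to twice as many comparable items. The lengthening factor and the toy data are hypothetical illustrations.

```python
from statistics import pvariance


def coefficient_alpha(scores):
    """Coefficient alpha for an examinee-by-item score matrix.
    For 0/1 items with population variances this equals KR-20."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def spearman_brown(reliability, length_factor):
    """Predicted reliability after multiplying test length by length_factor."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Hypothetical responses: 4 examinees x 3 items
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(coefficient_alpha(scores), 3))  # 0.75
print(round(spearman_brown(0.62, 2), 3))    # 0.765
```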

Sex and racial group mean score differences in performance favored males and Whites. Male-female mean score differences were generally small (d = .27 to .40) by Cohen's (1988) guidelines. Although White-Black mean score differences were large (d = .93 to .98), they were consistent with those observed in other aptitude measures (Gottfredson, 2002; Sackett, Schmitt, Ellingson, & Kabin, 2001; Schmidt & Hunter, 1998) and for the ASVAB tests (Russell, Reynolds, & Campbell, 1994).
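A minimal sketch of Cohen's d using the pooled within-group standard deviation, together with Cohen's (1988) conventional benchmarks of .2, .5, and .8. The article does not report which standard deviation served as the denominator, so the pooled SD is an assumption here, and the scores in the example are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group1, group2):
    """Standardized mean difference (group1 - group2) using the pooled SD."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)


def cohen_label(d):
    """Cohen's (1988) conventional benchmarks: .2 small, .5 medium, .8 large."""
    size = abs(d)
    if size < 0.2:
        return "below small"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Hypothetical scores for two groups
d = cohens_d([10, 12, 14], [8, 10, 12])
print(d, cohen_label(d))
```

Applied to the ranges reported above, these benchmarks classify .27 to .40 as small and .93 to .98 as large.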

Correlations between the CT knowledge and logic forms and ASVAB scores were examined to explore relations between the tests. Analyses also included the Armed Forces Qualification Test (AFQT), a composite of the four ASVAB verbal and math tests (Arithmetic Reasoning, Word Knowledge, Paragraph Comprehension, and Mathematics Knowledge). The AFQT is used by all U.S. military services for enlistment qualification and is an indicator of g. Because examinees had previously been selected on the ASVAB, correlations were corrected for multivariate range restriction (Lawley, 1943). The 1997 national Profile of American Youth (PAY97; Segall, 2004) served as the reference population for this correction. After correction, the CT knowledge and logic forms had moderate to strong correlations with the ASVAB tests. Corrected correlations ranged from .55 to .77 between the AFQT and the CT knowledge forms and from .53 to .81 between the AFQT and the CT logic forms. Among the ASVAB technical knowledge tests, CT scores correlated most strongly with General Science (.56 to .71 for knowledge forms, .44 to .64 for logic forms).
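The multivariate correction cited above (Lawley, 1943) adjusts the full covariance matrix using the known unrestricted covariance matrix of the explicit selection variables (here, the ASVAB tests in the reference population). Let x denote the selection variables, y the incidental variables (e.g., CT scores or a criterion), and V the restricted sample covariance matrix. The standard Pearson-Lawley formulas give the corrected blocks as Σxy = Σxx Vxx⁻¹ Vxy and Σyy = Vyy − Vyx Vxx⁻¹ Vxy + Vyx Vxx⁻¹ Σxx Vxx⁻¹ Vxy. Below is a minimal NumPy sketch of those formulas. The function name and argument layout are assumptions, and it is not the code used in the study.

```python
import numpy as np


def lawley_correct(cov_restricted, sel_idx, sigma_xx):
    """Pearson-Lawley multivariate correction for range restriction.

    cov_restricted: covariance matrix of all variables in the selected sample.
    sel_idx: indices of the explicit selection variables.
    sigma_xx: covariance matrix of those selection variables in the reference population.
    Returns the corrected correlation matrix of all variables.
    """
    V = np.asarray(cov_restricted, dtype=float)
    x = np.asarray(sel_idx, dtype=int)
    y = np.setdiff1d(np.arange(V.shape[0]), x)  # incidental variables
    Vxx, Vxy, Vyy = V[np.ix_(x, x)], V[np.ix_(x, y)], V[np.ix_(y, y)]
    Sxx = np.asarray(sigma_xx, dtype=float)

    B = np.linalg.solve(Vxx, Vxy)          # Vxx^-1 Vxy: regressions of y on x in the sample
    Sxy = Sxx @ B                          # corrected x-y covariances
    Syy = Vyy - Vxy.T @ B + B.T @ Sxx @ B  # corrected y-y covariances

    S = np.empty_like(V)
    S[np.ix_(x, x)] = Sxx
    S[np.ix_(x, y)] = Sxy
    S[np.ix_(y, x)] = Sxy.T
    S[np.ix_(y, y)] = Syy
    sd = np.sqrt(np.diag(S))
    return S / np.outer(sd, sd)


# Hypothetical example: one selection variable whose SD is 2 in the sample and 3 in the population.
V = [[4.0, 1.2, 1.0], [1.2, 1.0, 0.3], [1.0, 0.3, 2.0]]
print(np.round(lawley_correct(V, [0], [[9.0]]), 3))
```

In this hypothetical example, the restricted correlation of .60 between the first two variables rises to about .75 after correction. When Σxx equals the sample Vxx, the function simply returns the sample correlation matrix.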

Correlations between average CT knowledge and logic scores and the biodata items revealed several moderate relationships. The strongest relationships occurred between cyber knowledge and reported experience working with computer hardware (e.g., ordering computer parts, reading manufacturer specifications, and building or repairing computers). Biodata items generally had weak relationships with ASVAB scores, with the exception of the Electronics Information (EI) test, which had moderate relationships with many of the same items to which the CT was related.
Technical Training School Validation

Predictive Validity Versus Final School Grades

Russell and Sellman (2010) examined the predictive validity of the CT knowledge, logic, and biodata measures against technical training grades. Six Air Force technical training courses and two Navy “A” courses were included in the study. All Air Force occupations were cyber/IT-related and were drawn from the intelligence and communications-computer functional communities. Nearly all of the Air Force occupations have since been reclassified with new occupational titles and specialty codes, but they represent substantial coverage of what are now considered cyber warrior occupations (Scott, Conley, Mesic, O’Connell, & Medlin, 2010). See Table 3 for a list of courses.

The predictor battery consisted of the CT, a biodata measure, and Figural Reasoning (FR). The tests were administered to students at the beginning of technical training. Final school grades (FSGs) were collected at the end of training to serve as criteria for validating the measures. In total, 1,127 students had both predictor data and FSGs.

Table 3 summarizes the observed validities for the predictors. Sample-size-weighted mean validities across occupations appear at the bottom of the table. For comparison purposes, Table 3 includes the ASVAB EI test, which was part of the selection composite for several of the cyber-related jobs. The AFQT had the highest weighted mean validity (.41), followed by the CT (.37), FR (.25), EI (.22), and biodata (.19). The CT significantly predicted FSGs for all but one of the occupations (Network Intelligence Analyst, 1N4X1). These results suggest that the CT is a better predictor than EI, which is currently part of the composites used to qualify military applicants for many of the cyber/IT occupations.
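The weighted means in the bottom row of Table 3 are sample-size-weighted averages of the course-level coefficients. The short sketch below (function name hypothetical) applies that computation to the course Ns and the observed AFQT and CT validities in Table 3 and reproduces the reported .41 and .37.

```python
def weighted_mean_validity(ns, rs):
    """Sample-size-weighted mean of per-course validity coefficients."""
    if len(ns) != len(rs):
        raise ValueError("ns and rs must have the same length")
    return sum(n * r for n, r in zip(ns, rs)) / sum(ns)


# Course Ns and observed validities, in Table 3 row order.
course_ns = [79, 138, 170, 161, 188, 147, 183, 61]
afqt_validities = [0.25, 0.37, 0.54, 0.33, 0.44, 0.47, 0.37, 0.35]
ct_validities = [0.15, 0.34, 0.43, 0.43, 0.46, 0.35, 0.31, 0.34]

print(round(weighted_mean_validity(course_ns, afqt_validities), 2))  # 0.41
print(round(weighted_mean_validity(course_ns, ct_validities), 2))    # 0.37
```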

Table 4 summarizes the validities of the predictors after multivariate correction for range restriction (Lawley, 1943) to the military enlisted applicant sample. All validities increased in magnitude after correction. The AFQT (.73) and CT (.64) had the highest mean validities across the eight courses.

CT Incremental Validity Versus Final School Grades

The AFQT was used as a baseline (observed r = .41) against which to evaluate the incremental validity of the other measures for predicting FSGs. The CT showed a small amount of incremental validity when used in combination with the AFQT and compared favorably with the other measures (EI, FR, and biodata). For the observed correlations, the weighted mean incremental validities for the eight courses were CT (.051), EI (.031), FR (.012), and biodata (.008). After correction for multivariate range restriction on the ASVAB, the weighted mean incremental validities for the eight courses were CT (.022), EI (.016), FR (.006), and biodata (.006).
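Incremental validity here is the gain in multiple correlation when a predictor is added to the AFQT. For two predictors, the multiple correlation follows from the three zero-order correlations, as the sketch below shows. It is illustrative only. The study computed increments course by course and then averaged them, and the AFQT-CT correlation used in the usage line (.60) is a hypothetical value, not one reported for these samples.

```python
from math import sqrt


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of criterion y on two predictors, from zero-order correlations."""
    r_squared = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return sqrt(r_squared)


def incremental_validity(r_y_base, r_y_new, r_base_new):
    """Gain in R from adding a new predictor to a single baseline predictor."""
    return multiple_r(r_y_base, r_y_new, r_base_new) - abs(r_y_base)


# .41 and .37 are the observed weighted mean validities of the AFQT and CT;
# the AFQT-CT correlation of .60 is hypothetical.
print(round(incremental_validity(0.41, 0.37, 0.60), 3))  # 0.028
```

The more strongly the new predictor overlaps with the baseline (larger r_12), the smaller the increment, even when its own validity is high.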


Table 3

Observed Validity Estimates by Course

Correlation with final school grade

Service/course                                                 N      AFQT    EI      Biodata  CT      FR
Air Force
  1N4X1 Network Intelligence Analysis                          79     .25*    .26*    .16      .15     .04
  2E1X1 Satellite Wideband Telemetry                           138    .37**   .24**   .13      .34**   .27*
  2E1X3 Ground Radio Communication                             170    .54**   .12     .10      .43**   .31**
  2E2X1 Communications, Network, Switch, and Crypto Systems    161    .33*    .34**   .03      .43**   .21**
  3C0X1 Communications-Computer Systems Operations             188    .44**   .29**   .30**    .46**   .20**
  3C2X1 Communications-Computer Systems Controller             147    .47**   .18*    .27**    .35**   .23**
Navy
  Information Systems Technician (IT)                          183    .37**   .21**   .15*     .31**   .17
  Cryptologic Technician-Networks (CTN)                        61     .35**   .07     .10      .34**   .22
Weighted mean                                                  1,127  .41     .22     .19      .37     .25

*p < .05. **p < .01.


This  document  is  copyrighted  by  the  American  Psychological  Association  or  one  of  its  allied  publishers. 
This  article  is  intended  solely  for  the  personal  use  of  the  individual  user  and  is  not  to  be  disseminated  broadly. 




TRIPPE,  MORIARTY,  RUSSELL,  CARRETTA,  AND  BEATTY 


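Table 4

Validity Estimates Corrected for Multivariate Range Restriction by Course

Correlation with final school grade

Service/course                                                 N      AFQT    EI      Biodata  CT      FR
Air Force
  1N4X1 Network Intelligence Analysis                          79     .61     .48     .36      .46     .14
  2E1X1 Satellite Wideband Telemetry                           138    .72     .58     .36      .66     .44
  2E1X3 Ground Radio Communication                             170    .82     .59     .27      .77     .53
  2E2X1 Communications, Network, Switch, and Crypto Systems    161    .73     .68     .31      .74     .40
  3C0X1 Communications-Computer Systems Operations             188    .73     .53     .48      .69     .49
  3C2X1 Communications-Computer Systems Controller             147    .65     .38     .35      .48     .42
Navy
  Information Systems Technician (IT)                          183    .76     .52     .50      .61     .47
  Cryptologic Technician-Networks (CTN)                        61     .69     .28     .14      .53     .54
Weighted mean                                                  1,127  .73     .53     .37      .64     .45

Several points should be kept in mind when evaluating the incremental validities. First, the incremental validity analyses do not reflect the way the ASVAB is used operationally. Incremental validity analyses address how much additional prediction the new test would provide if the AFQT were used optimally (i.e., as a top-down selection tool rather than as a dichotomized score). Because the AFQT is not used optimally, the incremental validity estimates are conservative and may underestimate the actual selection efficiency of the CT and the other measures.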

A  couple  of  issues  should  be  kept  in  mind 
when  evaluating  the  incremental  validities. 
First,  the  incremental  validity  analyses  do  not 
reflect  the  way  the  ASVAB  is  used  operation¬ 
ally.  Incremental  validity  analyses  address  how 
much  additional  prediction  the  new  test  would 
provide  if  the  AFQT  were  used  optimally  (i.e., 
as  a  top-down  selection  tool,  not  as  a  dichoto¬ 
mized  score).  Because  the  AFQT  is  not  used 
optimally,  the  incremental  validity  estimates  are 
conservative  and  may  underestimate  the  actual 
selection  efficiency  of  the  CT  and  other  mea¬ 
sures. 

Regardless, incremental validity is an index the Services have used to evaluate new predictors for many years. The estimates reported here are similar to those for the ASVAB technical knowledge tests that are not part of the AFQT (General Science, Mechanical Comprehension, EI, and Auto and Shop Information). For example, Oppler et al. (1997) reported incremental validities from a Joint-Service study that included 13 technical training courses. Validities for each ASVAB test were computed using only the training courses that included that test in their composites. Average incremental validity estimates beyond the AFQT, after correction for multivariate range restriction, ranged from .012 for EI to .034 for Auto and Shop Information.

Finally, military research suggests that even small validity increments (e.g., .02) can have utility in large selection programs (Held, Fedak, Crookenden, & Blanco, 2002; Schmidt, Dunn, & Hunter, 1995).
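One common way to see why small increments can matter is the Brogden-Cronbach-Gleser utility model. Under top-down selection, the expected performance gain from a validity increment ΔR is N × ΔR × SDy × (the mean standardized predictor score of those selected). The sketch below is a hypothetical illustration of that model, not an analysis from the cited studies. It assumes a normally distributed predictor and strict top-down selection (which, as noted above, is not how the AFQT is used), and it ignores testing costs. SDy is the standard deviation of job performance in whatever units the user supplies.

```python
from statistics import NormalDist


def utility_gain(n_selected, delta_r, sd_y, selection_ratio):
    """Brogden-Cronbach-Gleser expected performance gain from a validity increment.

    Assumes a normal predictor and top-down selection of the top
    `selection_ratio` share of applicants; testing costs are ignored.
    """
    if not 0 < selection_ratio < 1:
        raise ValueError("selection_ratio must be strictly between 0 and 1")
    z = NormalDist()
    cutoff = z.inv_cdf(1 - selection_ratio)            # predictor cut score in z units
    mean_z_selected = z.pdf(cutoff) / selection_ratio  # mean predictor z of those selected
    return n_selected * delta_r * sd_y * mean_z_selected


# Hypothetical: 10,000 selectees, a .02 increment, SDy = 1 unit, half of applicants selected.
print(round(utility_gain(10_000, 0.02, 1.0, 0.5), 1))  # 159.6
```

Because the gain scales linearly with the number selected, an increment that looks negligible per person can add up across a large accession program.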


Additional Navy Training School Validation

Near the conclusion of the original training school validation study, the Navy significantly altered training in the Cryptologic Technician-Networks (CTN) course. Therefore, the Navy wanted to know whether the CT was a significant predictor of performance in the new course format. An additional sample of 118 CTN trainees completed the CT predictor battery during their first week of training in the revised course format. Two criterion variables were available for the validation analyses: grade point average (GPA) and graduation status (pass/fail). GPA was the average score computed from 19 course modules. Only individuals who ultimately passed the course had a reported final GPA; however, course module scores were available for nongraduates up to the point of failure. That is, students continued in the course until they scored below 70% on a module, at which point the academic review board determined whether the student should be dropped from the class. Students had to maintain a course average of 75% or higher and pass all module tests by scoring 70% or better. Because validating the CT against the GPAs of only the successful candidates would restrict the variance in criterion scores, we imputed GPAs for all students using the average of the available course module scores.
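The imputation rule can be sketched as follows. Modules not taken (after a student was dropped) are marked as missing and excluded, so a nongraduate's imputed GPA is the mean of the modules completed up to the point of failure. The function name and example scores are hypothetical.

```python
from statistics import mean

N_MODULES = 19  # course modules in the revised CTN course


def imputed_gpa(module_scores):
    """Mean of completed module scores; None marks modules not taken after a student was dropped."""
    taken = [score for score in module_scores if score is not None]
    if not taken:
        raise ValueError("no completed modules")
    return mean(taken)


# Hypothetical students: one completed every module; one scored below 70 on module 3 and was dropped.
graduate = [85] * N_MODULES
nongraduate = [88, 79, 65] + [None] * (N_MODULES - 3)
print(imputed_gpa(graduate), round(imputed_gpa(nongraduate), 2))  # 85 77.33
```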

Table 5 contains the multiple correlations that resulted from regressing GPA on the existing ASVAB composite predictors and the CT. It should be noted that Navy personnel can qualify for CTN training on either of two composites (Composite 1 = AR + 2*MK + GS; Composite 2 = VE + AR + MK + MC). As a result, we examined the incremental validity of the CT against the AFQT and each of the Navy CTN composites. All incremental gains in the observed multiple correlations were statistically significant at the .01 level. Incremental gains corrected for multivariate range restriction were more modest than the observed gains. This may be due in part to the fact that the ASVAB variances were adjusted directly to the population values (which tends to result in a larger adjustment), whereas the CT variances were adjusted indirectly.

Table 5

Validity and Incremental Validity of CT in Predicting GPA

                    Observed                              Corrected
                    Reported GPA      Imputed GPA       Reported GPA      Imputed GPA
                    (n = 76)          (n = 118)         (n = 76)          (n = 118)
Predictor(s)        R        ΔR       R        ΔR       ρ        Δρ       ρ        Δρ
Cyber Test (CT)     .39**    -        .46**    -        .66      -        .69      -
AFQT                .41**    -        .40**    -        .80      -        .79      -
AFQT + CT           .49**    .08**    .52**    .12**    .81      .01      .82      .02
Composite 1ᵃ        .45**    -        .44**    -        .82      -        .81      -
Comp1 + CT          .52**    .07**    .55**    .11**    .84      .02      .84      .03
Composite 2ᵇ        .44**    -        .45**    -        .81      -        .81      -
Comp2 + CT          .50**    .06**    .54**    .09**    .82      .01      .82      .02

Note. ρ indicates coefficients that were corrected for multivariate range restriction (Lawley, 1943).
ᵃ AR + 2*MK + GS. ᵇ VE + AR + MK + MC.
**p < .01.

The second training criterion was graduation status (pass/fail). Table 6 presents the results of logistic regression analyses in which graduation status was regressed on the CT score alone and on the CT score in combination with the ASVAB composite predictors. The table includes Nagelkerke’s (1991) adjusted coefficient of determination as well as the χ² value for each model. The χ² values for all models were statistically significant at the .01 level. The increments in χ² associated with adding the CT to a model including the AFQT or Composite 1 were both statistically significant at the .05 level. The increment in χ² associated with adding the CT to Composite 2 was not statistically significant (Δχ² = 3.29; χ²crit(1) = 3.84 at p = .05). These results indicate that the CT had significant value as a predictor of performance in the CTN course and provided incremental prediction over two of the three ASVAB composites.
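Nagelkerke's R² can be recovered from each model's likelihood-ratio χ² and the outcome split. The sketch below (function names hypothetical) computes the Cox-Snell R² = 1 − exp(−χ²/n), rescales it by its maximum attainable value given the null-model log-likelihood, and computes the upper-tail p value of a 1-df χ² for the increments. Assuming 76 graduates (the reported-GPA sample in Table 5) and 42 nongraduates, it reproduces the R²(Nag) column of Table 6 to two decimals (e.g., .11 for the CT alone and .15 for AFQT + CT). It also shows that the increment for adding the CT to Composite 2 (3.29) falls short of p < .05, while the increments of 4.82 and 4.79 reach it.

```python
import math


def nagelkerke_r2(lr_chi2, n_events, n_nonevents):
    """Nagelkerke R^2 from a model's likelihood-ratio chi-square versus the intercept-only model."""
    n = n_events + n_nonevents
    p = n_events / n
    # Log-likelihood of the intercept-only (null) model for a binary outcome.
    ll_null = n_events * math.log(p) + n_nonevents * math.log(1 - p)
    # Cox-Snell R^2, since LR chi-square = 2 * (LL_model - LL_null).
    r2_cox_snell = 1 - math.exp(-lr_chi2 / n)
    # Largest Cox-Snell R^2 attainable for this outcome split.
    r2_max = 1 - math.exp(2 * ll_null / n)
    return r2_cox_snell / r2_max


def chi2_pvalue_df1(x):
    """Upper-tail p value of a chi-square statistic with 1 df: P(Z**2 > x)."""
    return math.erfc(math.sqrt(x / 2))


GRADUATES, NONGRADUATES = 76, 42
table6_chi2 = {"CT": 9.84, "AFQT": 8.95, "AFQT + CT": 13.78,
               "Composite 1": 13.76, "Comp1 + CT": 18.55,
               "Composite 2": 14.49, "Comp2 + CT": 17.77}
for model, chi2 in table6_chi2.items():
    print(f"{model:12s} R2(Nag) = {nagelkerke_r2(chi2, GRADUATES, NONGRADUATES):.2f}")

# Adding the CT adds one parameter, so each increment is tested on 1 df.
for label, delta in [("AFQT", 4.82), ("Composite 1", 4.79), ("Composite 2", 3.29)]:
    print(f"CT over {label}: p = {chi2_pvalue_df1(delta):.3f}")
```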

Testing of Military Applicants at Military Entrance Processing Stations. Subsequent studies involved data collection on military applicants tested at the Military Entrance Processing Stations (MEPS). The objectives of these studies were to (a) estimate psychometric properties of the CT items in an applicant sample, (b) finalize two operational forms, (c) develop norms in military applicant samples, (d) further examine the relations between the CT and ASVAB tests, and (e) initiate longitudinal predictive validation studies.

To prepare for MEPS testing, new CT items were generated and four forms of the test were developed. In addition to entirely new items being written, some previous nontraditional-format items were converted to the multiple choice format. Biodata items were eliminated from all CT forms because of concerns about potential response distortion as well as generally low predictive validity. Each CT form included 26 anchor items and 14 unique items and used the same content specifications. This was done to produce tests of similar length to the current ASVAB technical knowledge tests and to collect item-level data on a large set of items. The test plan was to administer the four CT forms to a combined sample of 50,000 Air Force, Army, and Navy applicants. The large sample sizes were needed to enable subgroup analyses on the four CT forms.

Table 6

Logistic Regression Results

Predictor           R²(Nag)   ΔR²(Nag)   χ²       Δχ²
Cyber Test (CT)     .11       -          9.84     -
AFQT                .10       -          8.95     -
AFQT + CT           .15       .05        13.78    4.82
Composite 1ᵃ        .15       -          13.76    -
Comp1 + CT          .20       .05        18.55    4.79
Composite 2ᵇ        .16       -          14.49    -
Comp2 + CT          .19       .03        17.77    3.29

Note. n = 76 graduates and n = 42 nongraduates.
ᵃ AR + 2*MK + GS. ᵇ VE + AR + MK + MC.

One of the objectives of the MEPS administration was to refine the available item pool based on a large-scale applicant administration. The majority of items that composed the forms administered at the MEPS had been pilot-tested on relatively small samples of Air Force or Navy recruits who had already passed several selection hurdles. We expected to remove some of the items from the pool for psychometric reasons (e.g., inappropriate difficulty level, low item-total score correlations, and large subgroup differences) or because of flaws in experimental items that would be revealed only after pilot testing. Four items were removed from the pool because post hoc SME review of the item content, in the context of the psychometric information, revealed item flaws such as misleading language or more than one response option that could be considered correct. Twelve items were removed because they did not perform well in the applicant population; that is, some items had low or negative item-total correlations, extremely high or low p values, or poorly calibrated item response theory (IRT) parameters (i.e., extreme or out-of-bounds values) despite the absence of any apparent flaw in the item content. Of the items removed from the pool, the majority were removed for being too difficult in the applicant sample.

The final CT item pool was calibrated and analyzed using an IRT measurement model known as the three-parameter logistic model (3PL; Lord, 1980; Lord & Novick, 1968). In essence, IRT assumes that examinees’ responses to test items are the result of the underlying levels of ability possessed by those individuals. IRT provides a unified approach to a variety of test analysis, development, and reporting activities and is implemented by fitting, or calibrating, statistical models to examinee responses. Applying these models places item difficulty and examinee (population) ability on a common scale simultaneously. Calibration was executed via the software program MULTILOG (Thissen, 2003).

Another goal of the MEPS administration was to construct two operational forms from the items that composed the experimental forms and to develop a reporting metric. The target length of each operational form was 30 items, with no overlap between forms. We began the form assembly effort with the 65-item pool retained from the 82 unique items administered to the applicant sample. The resulting forms needed to be balanced with respect to (a) item content, (b) item subcontent, (c) difficulty, (d) discrimination, (e) reliability, and (f) keyed responses. We also needed to consider item “enemies” (i.e., items that assess identical or highly similar content) when making form assignments. To determine the optimal assignment of items to forms given these competing test specifications, we used Automated Test Assembly (ATA; van der Linden, 2005). The final operational forms contained 29 items each, one short of the original goal of 30 items per form. The 29-item solution resulted in the best balance of content, difficulty, discrimination, and reliability across the two forms. Including additional items upset that balance at a cost that we judged greater than any benefit achieved in reliability or information.
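ATA is typically formulated as a mixed-integer programming problem (van der Linden, 2005), which requires an optimization solver. As a much simpler stand-in, the sketch below uses a greedy heuristic, not ATA, to show the kinds of constraints involved: non-overlapping forms of fixed length, balanced content areas, balanced difficulty, and separation of enemy items. The Item fields, content labels, and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    content: str  # content area label (hypothetical)
    b: float      # IRT difficulty


def assemble_two_forms(items, enemies, form_length):
    """Greedy heuristic for two non-overlapping forms.

    Items are processed hardest-first within each content area. Each item goes to the
    eligible form with fewer items from its area, then the lower summed difficulty.
    enemies is a set of frozensets of item_id pairs that may not share a form.
    Items that fit on neither form are left unassigned.
    """
    forms = ([], [])
    for item in sorted(items, key=lambda it: (it.content, -it.b)):
        candidates = []
        for k, form in enumerate(forms):
            if len(form) >= form_length:
                continue  # form is full
            if any(frozenset((item.item_id, other.item_id)) in enemies for other in form):
                continue  # would put two enemy items on the same form
            n_same_content = sum(other.content == item.content for other in form)
            total_b = sum(other.b for other in form)
            candidates.append((n_same_content, total_b, len(form), k))
        if candidates:
            forms[min(candidates)[3]].append(item)
    return forms


pool = [Item("A1", "networking", 1.0), Item("A2", "networking", 0.5),
        Item("B1", "security", 0.2), Item("B2", "security", -0.3)]
form_a, form_b = assemble_two_forms(pool, {frozenset(("A1", "B1"))}, form_length=2)
print([it.item_id for it in form_a], [it.item_id for it in form_b])  # ['A1', 'B2'] ['A2', 'B1']
```

Unlike a solver-based ATA model, a greedy pass cannot trade off all specifications at once and may leave items unassigned. That limitation is one reason ATA is used in practice.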

Subgroup  Norms 

Standardized mean difference comparisons were computed across five subgroups: males (n = 39,951), females (n = 11,859), non-Hispanic Blacks (n = 7,524), non-Hispanic Whites (n = 25,607), and Hispanic Whites (n = 5,251). These groups were chosen to be consistent with designations used by the ASVAB testing program (Defense Manpower Data Center, 2011). Results for the CT, several ASVAB tests, and the AFQT are found in Table 7. The CT had smaller standardized mean differences than the ASVAB technical knowledge tests in the male-female comparison. Male-female differences were larger for the CT than for Assembling Objects^ (AO) or the AFQT. Differences between non-Hispanic Whites and non-Hispanic Blacks were smaller for the CT than for any of the other technical knowledge tests. Similarly, differences in the non-Hispanic White versus Hispanic White comparison were smaller for the CT than for any other test, with the exception of AO.

Table 7

Standardized Subgroup Mean Differences of the Cyber Test and ASVAB Technical Tests in Applicant Sample

Test                                          d (male-female)ᵃ   d (White-Black)ᵇ   d (White-Hispanic)ᶜ
Cyber Test (CT)                               0.44               0.55               0.36
Armed Forces Qualification Test (AFQT)        0.30               0.81               0.48
Assembling Objects (AO)                       0.19               0.59               0.14
Auto and Shop (AS)                            1.05               1.14               0.62
General Science (GS)                          0.56               0.99               0.61
Electronics Information (EI)                  0.83               1.00               0.60
Mechanical Comprehension (MC)                 0.82               1.09               0.55

ᵃ Male (n = 39,951) vs. female (n = 11,859). ᵇ Non-Hispanic White (n = 25,607) vs. non-Hispanic Black (n = 7,524). ᶜ Non-Hispanic White (n = 25,607) vs. Hispanic White (n = 5,251).

Construct  Validity 

To evaluate the construct validity of the CT, we tested a series of confirmatory factor analysis (CFA) models, depicted in Figures 1 through 3, using the technical training school validation sample. We tested highly similar models in the MEPS applicant sample and obtained comparable results. However, we present the training school modeling results here because multiple nonverbal reasoning variables were available in the training school sample. Model 1 is based on prior factor analytic work on the ASVAB (Kass, Mitchell, Grafton, & Wing, 1983) and serves as a baseline against which to compare subsequent models that include the CT. Observed variables in Model 1 included the nine ASVAB tests and the FR test administered with the CT. The four hypothesized latent variables in Model 1 were factors representing Quantitative (QUANT), Verbal (VERBAL), Technical Knowledge (TECH), and Non-Verbal Reasoning (NVR). Model 2 added the CT as an observed variable hypothesized to load on the technical factor. The CT is conceptually similar to the other technical knowledge tests (General Science, Auto and Shop, EI, and Mechanical Comprehension) in that it is an information test designed to assess knowledge and aptitude in a technical domain. Model 3 was a revision of Model 2 in which the CT was hypothesized to load on both the Technical and Verbal factors. Kass et al. (1983) found that the General Science test loads on both technical and verbal factors. The CT is similar to GS in that its reading requirements are relatively more demanding than those of the quantitative or other technical knowledge tests.

^ The Assembling Objects test is a nonverbal reasoning test that requires examinees to determine how an object will appear when its parts are put together.

Figure 1. Confirmatory factor analysis Model 1. The tests were Arithmetic Reasoning (AR), Math Knowledge (MK), Word Knowledge (WK), Paragraph Comprehension (PC), General Science (GS), Electronics Information (EI), Auto and Shop Information (AS), Mechanical Comprehension (MC), Assembling Objects (AO), and Figural Reasoning (FR). The factors were Quantitative (QUANT), Verbal (VERBAL), Technical Knowledge (TECH), and Non-Verbal Reasoning (NVR). See the online article for the color version of this figure.

Table 8 summarizes the fit indices for the models. The χ² value associated with each model was statistically significant, indicating poor model fit. However, the χ² test is generally not relied on as an index of overall model fit for models tested on samples larger than 200. CFI and TLI/NNFI values above .95, RMR values below .05, and RMSEA values below .08 are generally indicative of good model fit (Kenny, 2009). CFI values met this guideline, and TLI/NNFI and RMR values were at or near their respective guidelines, suggesting that Models 1-3 exhibited good fit. The RMSEA index suggested poor model fit. The higher than desirable RMSEA values were likely due in part to that index’s sensitivity to the ratio of parameters to degrees of freedom. Given the complexity of Models 1-3, it is reasonable to conclude that their fit to the data is within the acceptable range. Because Models 2 and 3 are nested, their relative fit can be compared directly via the change in χ². Model 3 fit the data significantly better than Model 2 (Δχ² = 78.88, df = 1, p < .01), suggesting that both the technical knowledge and verbal factors contributed significantly to the CT.
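The RMSEA point estimate is commonly computed as √(max(χ² − df, 0) / (df(N − 1))). Software conventions vary (e.g., N rather than N − 1), so the sketch below is not guaranteed to match Table 8 exactly. With hypothetical values, it shows the property the text appeals to: for the same excess of χ² over df, a model with fewer degrees of freedom receives a larger RMSEA.

```python
import math


def rmsea(chi2, df, n):
    """Point estimate of the root mean square error of approximation."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Hypothetical: identical misfit (chi-square exceeds df by 300) at two different df.
n = 1000
print(round(rmsea(10 + 300, 10, n), 3), round(rmsea(40 + 300, 40, n), 3))  # 0.173 0.087
```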

Discussion 

Cyberspace is both an established and an emerging national security front (Smart, 2011). As its importance to national defense becomes increasingly apparent, we will undoubtedly observe a concomitant demand to select, classify, and train cyber warriors. Indeed, shortages of cyber security personnel are being reported in the military and federal agencies (Beidel & Magnuson, 2011). Although there is no single solution to gaps in cyber knowledge and available cyber personnel within the Services, one way to address shortages and confront emerging threats is to begin identifying the applicants most likely to succeed in cyber-related training. Expertise takes years to develop, and the development of methods to assess suitability for cyber/IT career fields is only a first step.

Development and analysis of the CT are ongoing. The large-scale MEPS administration of the CT will serve as the foundation for the evaluation of new item pools and for longitudinal validation studies. Nevertheless, the cumulative research to date has been sufficient to convince policymakers to begin preliminary operational use of the CT.

Currently, the CT is a special test administered in static form on the CAT-ASVAB platform. In 2011, the Services formed a CT working group to address implementation issues. These include (a) test maintenance (e.g., review of item specifications, development of an expanded item pool, and evaluation of item obsolescence), (b) identification of resources (funding, cyber/IT SMEs), (c) determination of the frequency of planned updates, and (d) the development of Service ASVAB/CT composites.

Figure 2. Confirmatory factor analysis Model 2. The tests were Arithmetic Reasoning (AR), Math Knowledge (MK), Word Knowledge (WK), Paragraph Comprehension (PC), General Science (GS), Cyber Test (CT), Electronics Information (EI), Auto and Shop Information (AS), Mechanical Comprehension (MC), Assembling Objects (AO), and Figural Reasoning (FR). The factors were Quantitative (QUANT), Verbal (VERBAL), Technical Knowledge (TECH), and Non-Verbal Reasoning (NVR). See the online article for the color version of this figure.

The Services are currently ready to use the CT as a special test administered to a limited number of applicants who may expand the pool of available qualified applicants. In June 2014, the Air Force began operational use of the CT. Its model expands the qualified applicant pool to include applicants who are five or fewer percentile points below the existing cut scores for qualifying into cyber occupations. Those who score high enough on the CT (standard score ≥ 60) to compensate for missing the existing cut scores are added to the pool of qualified applicants. Additional work in the areas of composite formation and standard setting is underway. More specifically, we are examining the predictive validity of predictor composites that combine and weight the CT with other ASVAB tests and with measures of personality (Carretta & Manley, 2014) to achieve specific goals (e.g., maximize predictive validity, minimize adverse impact). We are also continuing to explore standard setting in the context of compensatory predictive models, like the one described above, so that cut scores optimize policy objectives (e.g., success in training, diversity).
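The Air Force rule described above is a compensatory screen layered on the existing cut score. A hypothetical sketch follows. It assumes that composite scores and cut scores are both expressed as percentiles and that a CT standard score of 60 or higher is required to compensate. The function and parameter names are illustrative; this is not the Air Force's implementation.

```python
def qualifies_for_cyber(composite_pct, cut_pct, ct_standard_score, band=5, ct_min=60):
    """Hypothetical compensatory screen.

    composite_pct, cut_pct: the applicant's qualifying-composite percentile and the existing cut.
    band: how many percentile points below the cut the CT may compensate for.
    ct_min: CT standard score needed to compensate.
    """
    if composite_pct >= cut_pct:
        return True  # already meets the existing cut score
    if cut_pct - composite_pct <= band:
        return ct_standard_score >= ct_min  # near miss: a high CT score can compensate
    return False  # too far below the cut for the CT to compensate


for pct, ct in [(52, 40), (47, 63), (47, 55), (42, 80)]:
    print(pct, ct, qualifies_for_cyber(pct, 50, ct))
```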

Figure 3. Confirmatory factor analysis Model 3. The tests were Arithmetic Reasoning (AR), Math Knowledge (MK), Word Knowledge (WK), Paragraph Comprehension (PC), General Science (GS), Cyber Test (CT), Electronics Information (EI), Auto and Shop Information (AS), Mechanical Comprehension (MC), Assembling Objects (AO), and Figural Reasoning (FR). The factors were Quantitative (QUANT), Verbal (VERBAL), Technical Knowledge (TECH), and Non-Verbal Reasoning (NVR). See the online article for the color version of this figure.

Table 8

Fit Indices for CFA Models 1 Through 3

Model   df   χ²        CFI      TLI/NNFI   RMSEA    RMR
1       28   382.56    0.9635   0.9414     0.1046   0.04941
2       37   558.06    0.9582   0.9378     0.1118   0.05391
3       36   482.18    0.9642   0.9453     0.1031   0.04962

Note. Models 2 and 3 are nested. Sample size with complete data for all observed variables was 1,193.

The next phase of CT development will be to migrate the static test forms to an operational item pool suitable for computer adaptive testing (CAT). The existing item pool is relatively small in comparison to that of a CAT-ASVAB test and generally more subject to content obsolescence. Moreover, test “information”^ is concentrated at the higher end of the ability distribution, so the test is relatively precise around the existing cut score but relatively imprecise toward the middle and lower end of the ability distribution. Development efforts will focus on establishing a larger, contemporary item pool containing items that provide information along the entire continuum of ability. This kind of item pool is necessary to support CAT administration and to maintain proper item exposure controls for a test that is likely to be used increasingly for selection and classification.
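The CAT migration described above rests on selecting, at each step, an item that is informative at the examinee's current ability estimate while limiting how often any single item is shown. A minimal sketch of one well-known approach follows: maximum-information selection with randomesque exposure control, which chooses at random among the k most informative remaining items. It assumes 3PL items with D = 1.7, a pool stored as a dict mapping hypothetical item IDs to (a, b, c) tuples, and an ability estimate supplied from elsewhere. It is not presented as the CAT-ASVAB's selection algorithm.

```python
import math
import random

D = 1.7  # assumed logistic scaling constant


def info_3pl(theta, a, b, c):
    """3PL item information at ability theta."""
    p = c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2


def select_next_item(theta_hat, pool, administered, k=5, rng=random):
    """Randomesque maximum-information selection.

    pool maps item IDs to (a, b, c); administered is a set of IDs already given.
    Returns one of the k most informative remaining items, or None if none remain.
    """
    remaining = [item_id for item_id in pool if item_id not in administered]
    if not remaining:
        return None
    ranked = sorted(remaining, key=lambda i: info_3pl(theta_hat, *pool[i]), reverse=True)
    return rng.choice(ranked[:k])


pool = {"i1": (1.0, -2.0, 0.2), "i2": (1.2, 0.0, 0.2), "i3": (1.0, 2.0, 0.2)}
print(select_next_item(0.0, pool, administered=set(), k=1))  # i2
```

Setting k = 1 gives pure maximum-information selection. Larger k spreads exposure across more items at a small cost in precision, a trade-off that becomes practical only with the larger pool described above.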


^  Test  information  is  an  index  of  measurement  precision. 